Short Communication: Classification of Metabolites with Kernel-Partial Least Squares (K-PLS)
Abstract
Numerous experimental and computational approaches have been developed to predict human drug metabolism. Since databases of human drug metabolism information are widely available, these can be used to train computational algorithms and generate predictive approaches. In turn, they may be used to assist in the identification of possible metabolites from a large number of molecules in drug discovery based on molecular structure alone. In the current study we have used a commercially available database (MetaDrug) and extracted a fraction of the human drug metabolism data. These data were used along with augmented atom descriptors in a predictive machine learning model, kernel-partial least squares (K-PLS). A total of 317 molecules, including parent drugs and their primary and secondary (sequential) metabolites, were used to build these models corresponding to individual metabolism rules, representing the formation of discrete metabolites, e.g., N-dealkylation. Each model was internally validated to assess the capability to classify other molecules that were left out. Using receiver operator curve statistics, models for N-dealkylation, O-dealkylation, aromatic hydroxylation, aliphatic hydroxylation, O-glucuronidation, and O-sulfation gave area under the curve values from 0.75 to 0.84 and correctly predicted between 61 and 79% of active molecules upon leave-one-out testing. This preliminary study indicates that K-PLS and possibly other similar machine learning methods (such as support vector machines) can be applied to predicting human drug metabolite formation in a classification manner. Improvements can be achieved using considerably larger datasets that contain more positive examples for the less frequently occurring metabolite rules, as well as the external evaluation of novel molecules.

With the emphasis now on increasing the efficiency of drug discovery, there is interest in using predictive computational approaches to complement in vitro and in vivo studies. 
In the area of metabolism prediction, these techniques encompass pharmacophores (Ekins et al., 2001), quantitative structure-activity relationships (QSARs) (Shen et al., 2003; Balakin et al., 2004), electronic models (Korzekwa et al., 2004), and commercial drug metabolism databases (Borodina et al., 2004), as well as other methods that have been comprehensively reviewed elsewhere (de Graaf et al., 2005; Ekins et al., 2005a; de Groot, 2006). Some approaches have combined metabolite data and rules for suggesting metabolic pathways across multiple species (Erhardt, 2003). Such databases may also be useful for calculating the probability for a given metabolic reaction (Boyer and Zamora, 2002) to then indicate potential metabolites and the sites of metabolism using statistical or algorithmic approaches (Borodina et al., 2004). Although these types of comprehensive databases generally enable numerous search options to retrieve molecule structures and published information, the predictive capabilities seem limited at present (Wishart et al., 2006). A major limitation is that they are unlikely to have a complete dataset of reactions and molecular structures to extrapolate for a new molecule. In turn, the user is reliant on the quality of the published in vitro or in vivo data which, in many cases, may predate modern analytical methods, such that older published metabolic pathways may be incomplete. In reality, such database approaches provide knowledge of most published data and are perhaps limited to interpolation. The combination of different approaches to drug metabolite prediction may balance the strengths and weaknesses of each approach, and several commercial methods are now pursuing this direction. MetaDrug represents one such method, combining a manually annotated database of human drug metabolism information including xenobiotic reactions, enzyme substrates, and enzyme inhibitors with kinetic data (Ekins et al., 2005b, 2006). 
This database has enabled the generation of rules for predicting likely metabolic reactions. The parent molecule and metabolites may then be scored through integrated QSAR models and rules for molecule reactivity before visualizing molecules as nodes on a network diagram (Ekins et al., 2005b, 2006). Such rule-based metabolite predictions indicate that it is possible to generate many more metabolites than have been identified in the literature, which may make the methods less useful (Ekins et al., 2006). We are therefore investigating approaches to limit the metabolites to those that are most likely. Recently, a number of machine learning approaches including support vector machines and kernel-partial least squares (K-PLS) (Rosipal and Trejo, 2001) have been implemented in a single software package (Analyze/StripMiner), and

The development of MetaDrug was supported by National Institutes of Health Grants 1-R43-GM069124-01 and 2-R44-GM069124-02 “In silico Assessment of Drug Metabolism and Toxicity”.
Competing Financial Interest: MetaDrug is a proprietary tool developed and licensed by GeneGo, Inc.
Article, publication date, and citation information can be found at http://dmd.aspetjournals.org. doi:10.1124/dmd.106.013185.
ABBREVIATIONS: QSAR, quantitative structure-activity relationship; K-PLS, kernel-partial least squares; AUC, area under the curve.
Drug Metabolism and Disposition, Vol. 35, No. 3 (DMD 35:325–327, 2007). Copyright © 2007 by The American Society for Pharmacology and Experimental Therapeutics. Printed in U.S.A.
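The workflow summarized in the abstract (a K-PLS model per metabolism rule, leave-one-out testing, and area under the receiver operator curve) can be illustrated with a minimal sketch. This is not the Analyze/StripMiner implementation used in the study. It assumes a single-response kernel PLS in the style of Rosipal and Trejo (2001), a Gaussian kernel, and synthetic descriptor data in place of the augmented atom descriptors; the function names, `sigma`, and `n_components` are hypothetical choices made for illustration only.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kpls_fit(K, y, n_components):
    """Single-response kernel PLS on an uncentered training kernel K.
    Returns dual coefficients alpha (one per training molecule)."""
    n = len(y)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                      # center the kernel in feature space
    yc = y - y.mean()
    Kd, yd = Kc.copy(), yc.copy()
    T = np.zeros((n, n_components))
    U = np.zeros((n, n_components))
    for a in range(n_components):
        u = yd / np.linalg.norm(yd)     # with one response, u is the deflated y
        t = Kd @ u
        t /= np.linalg.norm(t)
        T[:, a], U[:, a] = t, u
        P = np.eye(n) - np.outer(t, t)  # deflate kernel and response by score t
        Kd = P @ Kd @ P
        yd = yd - t * (t @ yd)
    return U @ np.linalg.solve(T.T @ Kc @ U, T.T @ yc)

def kpls_predict(K_test, K_train, alpha, y_mean):
    """Score new rows; K_test holds kernels between test and training rows."""
    Kc = (K_test - K_train.mean(axis=0)[None, :]
          - K_test.mean(axis=1, keepdims=True) + K_train.mean())
    return Kc @ alpha + y_mean

def loo_scores(X, y, sigma=3.0, n_components=2):
    """Leave-one-out: refit without molecule i, then score molecule i."""
    K = rbf_kernel(X, X, sigma)
    n = len(y)
    scores = np.empty(n)
    for i in range(n):
        idx = np.delete(np.arange(n), i)
        K_tr = K[np.ix_(idx, idx)]
        alpha = kpls_fit(K_tr, y[idx], n_components)
        scores[i] = kpls_predict(K[i, idx][None, :], K_tr, alpha, y[idx].mean())[0]
    return scores

def roc_auc(y, scores):
    """AUC as the fraction of (active, inactive) pairs ranked correctly."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

# Synthetic stand-in data (an assumption): 20 "inactive" and 20 "active" rows.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(1.5, 1.0, (20, 5))])
y = np.array([0.0] * 20 + [1.0] * 20)
scores = loo_scores(X, y)
auc = roc_auc(y, scores)
print(f"Leave-one-out AUC: {auc:.2f}")
```

In a setting like the one described, one such binary model would be fitted per metabolism rule (for example N-dealkylation), with `y` marking the molecules for which that rule produces a reported metabolite.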
Similar resources
An Optimization Perspective on Kernel Partial Least Squares Regression
This work provides a novel derivation based on optimization for the partial least squares (PLS) algorithm for linear regression and the kernel partial least squares (K-PLS) algorithm for nonlinear regression. This derivation makes the PLS algorithm, popularly and successfully used for chemometrics applications, more accessible to machine learning researchers. The work introduces Direct K-PLS, a...
Sparse Kernel Orthonormalized PLS for feature extraction in large data sets
In this paper we are presenting a novel multivariate analysis method for large scale problems. Our scheme is based on a novel kernel orthonormalized partial least squares (PLS) variant for feature extraction, imposing sparsity constraints in the solution to improve scalability. The algorithm is tested on a benchmark of UCI data sets, and on the analysis of integrated short-time music features fo...
An In Silico Method for Screening Nicotine Derivatives as Cytochrome P450 2A6 Selective Inhibitors Based on Kernel Partial Least Squares
Nicotine and a variety of other drugs and toxins are metabolized by cytochrome P450 (CYP) 2A6. The aim of the present study was to build a quantitative structure-activity relationship (QSAR) model to predict the activities of nicotine analogues on CYP2A6. Kernel partial least squares (K-PLS) regression was employed with the electro-topological descriptors to build the computational models. Both...
Kernel PLS-SVC for Linear and Nonlinear Classification
A new method for classification is proposed. This is based on kernel orthonormalized partial least squares (PLS) dimensionality reduction of the original data space followed by a support vector classifier. Unlike principal component analysis (PCA), which has previously served as a dimension reduction step for discrimination problems, orthonormalized PLS is closely related to Fisher’s approach t...